# 43 Orthogonal projection and best approximation theorem.
🚧🚧(need to add figures)🚧🚧
## Orthogonal decomposition and orthogonal projection.
We now turn to the following problem:
Fix a subspace $W$ of $\mathbb R^{n}$. For any vector $x \in \mathbb R^{n}$, can we decompose $x$ into a sum of two parts $x = x_{||}+ x_{\perp}$, where $x_{||} \in W$ and $x_{\perp} \in W^{\perp}$?
The expression $x = x_{||}+x_{\perp}$ is the _orthogonal decomposition_ of $x$ with respect to the subspace $W$, and we call $x_{||}$ the _orthogonal projection_ of $x$ onto $W$, and $x_{\perp}$ _the component $x$ orthogonal to $W$_.
The answer is yes: such an orthogonal decomposition exists, and we will construct it, provided we believe $W$ has an orthogonal basis.
Let us assume that $u_{1},\ldots ,u_{k}$ is an orthogonal basis for $W$. Then since $x_{||}$ is required to lie in $W$, it must be some linear combination $$
x_{||} = c_{1}u_{1}+\cdots +c_{k}u_{k}
$$But what are these coefficients $c_{i}$? Since $x_{\perp} = x-x_{||}$ is required to lie in $W^{\perp}$, $x_{\perp}$ must be orthogonal to each of the basis vectors $u_{i}$ of $W$. So for each $i$, we have $$
u_{i}\cdot x_{\perp} = u_{i} \cdot (x-x_{||}) = 0
$$This implies that for each $i$ we have $$
\begin{array}{c}
u_{i}\cdot x = u_{i} \cdot x_{||} = u_{i}\cdot(c_{1}u_{1}+\cdots +c_{k}u_{k}) = c_{i}\,u_{i}\cdot u_{i} \\
\displaystyle \implies c_{i} = \frac{u_{i}\cdot x}{u_{i}\cdot u_{i}}
\end{array}
$$which gives us the coefficients $c_{i}$!
And to verify $x_{\perp} = x - x_{||}$ is indeed in $W^{\perp}$, for each basis vector $u_{i}$ of $W$, we compute the dot product $$
\begin{align*}
u_{i} \cdot x_{\perp} &= u_{i}\cdot (x-x_{||}) \\
&= u_{i} \cdot x - u_{i} \cdot x_{||} \\
&= u_{i}\cdot x - u_{i}\cdot(c_{1}u_{1}+\cdots +c_{k}u_{k}) \\
&= u_{i}\cdot x - c_{i}(u_{i}\cdot u_{i}) \\
&= u_{i}\cdot x - \frac{u_{i}\cdot x}{u_{i}\cdot u_{i}}(u_{i}\cdot u_{i}) \\
&= 0.
\end{align*}
$$Hence indeed, $x_{\perp}$ is in $W^{\perp}$.
So we have
> **Orthogonal decomposition with respect to a subspace $W$.**
> If $u_{1},\ldots,u_{k}$ is an orthogonal basis of $W$, then the orthogonal decomposition of $x$ with respect to $W$ is $$
\begin{array}{l}
x = x_{||} + x_{\perp} \quad\text{with }\ \ x_{||} \in W\ \ \ ,\ \ \ x_{\perp} \in W^{\perp} \\
\displaystyle \text{where }\ \ x_{||} =\left( \frac{u_{1}\cdot x}{u_{1}\cdot u_{1}} \right) u_{1} +\cdots +\left( \frac{u_{k}\cdot x}{u_{k}\cdot u_{k}} \right) u_{k}
\end{array}
$$We say $x_{||}$ is the _orthogonal projection of $x$ onto $W$_, and $x_{\perp}$ the _component of $x$ orthogonal to $W$_.
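The boxed formula translates directly into a few lines of code. Here is a minimal numerical sketch in Python with NumPy; the function name `orthogonal_decompose` and the sample vectors are our own illustrative choices, not anything standard.

```python
import numpy as np

def orthogonal_decompose(x, ortho_basis):
    """Split x = x_par + x_perp with x_par in W and x_perp in W-perp,
    given an *orthogonal* basis u_1, ..., u_k of W."""
    x_par = np.zeros_like(x, dtype=float)
    for u in ortho_basis:
        x_par += (u @ x) / (u @ u) * u   # c_i = (u_i . x) / (u_i . u_i)
    return x_par, x - x_par

# sanity check on W = span{(1,0,0), (0,1,1)}, an orthogonal pair
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0])
x = np.array([2.0, 3.0, 5.0])
x_par, x_perp = orthogonal_decompose(x, [u1, u2])
# x_perp should be orthogonal to every basis vector of W
print(np.isclose(u1 @ x_perp, 0) and np.isclose(u2 @ x_perp, 0))  # True
```

Note the function assumes the basis is orthogonal; feeding it a non-orthogonal basis silently gives a wrong answer, exactly as the remark below warns.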
Now it may seem like the orthogonal decomposition of $x = x_{||}+x_{\perp}$ with respect to $W$ depends on the choice of the orthogonal basis for $W$, but it actually does not. This decomposition is unique:
> **Uniqueness of orthogonal decomposition with respect to a subspace $W$.**
> Let $W$ be a subspace of $\mathbb R^{n}$. If the orthogonal decomposition of $x$ with respect to $W$ exists, that is, $x= x_{||}+x_{\perp}$ where $x_{||} \in W$ and $x_{\perp}\in W^{\perp}$, then it is unique.
$\blacktriangleright$ Proof. Suppose one can orthogonally decompose a vector $x$ with respect to subspace $W$ in two ways, $$
x = x_{||}+x_{\perp} \quad\text{and}\quad x=x_{||}'+ x_{\perp}'
$$where $x_{||},x_{||}' \in W$ and $x_{\perp},x_{\perp}' \in W^{\perp}$ . Then we have equality $$
\begin{align*}
x_{||}+x_{\perp} &= x_{||}'+x_{\perp}' \\
\implies \underbrace{x_{||}-x_{||}'}_{\in W} &= \underbrace{x_{\perp}'-x_{\perp}}_{\in W^{\perp}}
\end{align*}
$$Note the quantity $x_{||}-x_{||}'$ is in $W$ but equals an element of $W^{\perp}$. This means $$
x_{||}-x_{||}' \in W\cap W^{\perp}
$$But the intersection $W\cap W^{\perp} = \{\vec0\}$, so we have $x_{||}-x_{||}'=\vec 0$, namely $x_{||}=x_{||}'$, and hence also $x_{\perp}=x_{\perp}'$. In other words, the orthogonal decomposition $x=x_{||}+x_{\perp}$ with respect to $W$, where $x_{||}\in W$ and $x_{\perp}\in W^{\perp}$ is unique. $\blacksquare$
Let us introduce the notation $$
\text{proj}_{u}(x) = \left( \frac{u \cdot x}{u \cdot u} \right) u
$$for the orthogonal projection of $x$ onto the line spanned by $u$. Let us also write $$
\text{proj}_{W} (x) = x_{||}
$$Then we can restate the orthogonal projection of $x$ onto $W$ as follows:
> Let $W$ be a subspace with an orthogonal basis $u_{1},u_{2},\ldots,u_{k}$. Then the orthogonal projection of $x$ onto $W$ is $$
\text{proj}_{W}(x) = \text{proj}_{u_{1}}(x)+ \text{proj}_{u_{2}}(x) + \cdots + \text{proj}_{u_{k}}(x)
$$where $$
\text{proj}_{u}(x) = \left( \frac{u \cdot x}{u \cdot u} \right) u.
$$That is, the orthogonal projection of a vector $x$ onto a subspace $W$ is the sum of orthogonal projections of $x$ onto each basis vector of an orthogonal basis of $W$.
**Remark.** It is important that $u_{1},u_{2},\ldots,u_{k}$ is an orthogonal basis of $W$ when using the above formula to compute the orthogonal projection; otherwise the result will be incorrect! Using a process called the _Gram-Schmidt process_, which we will see later, we can turn any basis of $W$ into an orthogonal basis. Hence an orthogonal basis for a subspace $W$ of $\mathbb R^{n}$ always exists, and therefore one can always perform the orthogonal decomposition of a vector with respect to $W$.
**Example.**
Consider $W =\text{span}\{\begin{pmatrix}1\\1\\2\end{pmatrix},\begin{pmatrix}-3\\1\\1\end{pmatrix}\}$ and let $x = \begin{pmatrix}1\\1\\1\end{pmatrix}$. Find $\text{proj}_{W}(x)$, the orthogonal projection of $x$ onto $W$, and compute $x_{\perp}$, the component of $x$ orthogonal to $W$. Then check that $x_{\perp}$ and $\text{proj}_{W}(x)$ are actually orthogonal to each other.
$\blacktriangleright$ First we check that we have an orthogonal basis for $W$; this is important. Indeed $$
u_{1} = \begin{pmatrix}1\\1\\2\end{pmatrix},u_{2}=\begin{pmatrix}-3\\1\\1\end{pmatrix}
$$is an orthogonal basis for $W$, since $u_{1}\cdot u_{2} = -3+1+2 = 0$. So we can directly compute $$
\begin{align*}
\text{proj}_{W}(x)& =\text{proj}_{u_{1}}(x) + \text{proj}_{u_{2}}(x) \\
&=\frac{u_{1} \cdot x}{u_{1}\cdot u_{1}} u_{1}+ \frac{u_{2}\cdot x}{u_{2}\cdot u_{2}}u_{2} \\
&= \frac{4}{6} \begin{pmatrix}1\\1\\2\end{pmatrix} + \frac{-1}{11} \begin{pmatrix}-3\\1\\1\end{pmatrix} \\
\implies \text{proj}_{W}(x)& = \begin{pmatrix}31 / 33 \\
19 / 33 \\
41 / 33\end{pmatrix}.
\end{align*}
$$
And $x_{\perp}$, the component of $x$ orthogonal to $W$ is just $x -\text{proj}_{W}(x)$, namely $$
x_{\perp} = x - \text{proj}_{W}(x) = \begin{pmatrix}1\\1\\1\end{pmatrix}-\begin{pmatrix}31 / 33 \\
19 / 33 \\
41 / 33\end{pmatrix}= \begin{pmatrix}2 / 33 \\
14 / 33 \\
-8 / 33\end{pmatrix}.
$$
And indeed, if we compute $$
\text{proj}_{W}(x) \cdot x_{\perp} = \begin{pmatrix}31 / 33 \\
19 / 33 \\
41 / 33\end{pmatrix} \cdot \begin{pmatrix}2 / 33 \\
14 / 33 \\
-8 / 33\end{pmatrix} = \frac{62+ 266-328}{(33)^{2}} = 0
$$so $\text{proj}_{W}(x)$ is orthogonal to $x_{\perp}$ as expected. $\blacklozenge$
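The arithmetic above is easy to double-check numerically. Here is a short NumPy sketch reproducing the fractions with denominator $33$:

```python
import numpy as np

u1 = np.array([1.0, 1.0, 2.0])
u2 = np.array([-3.0, 1.0, 1.0])
x = np.array([1.0, 1.0, 1.0])

# proj_W(x) = proj_{u1}(x) + proj_{u2}(x), valid since u1 . u2 = 0
proj = (u1 @ x) / (u1 @ u1) * u1 + (u2 @ x) / (u2 @ u2) * u2
x_perp = x - proj

print(np.round(proj * 33))           # ≈ [31. 19. 41.]
print(np.round(x_perp * 33))         # ≈ [ 2. 14. -8.]
print(np.isclose(proj @ x_perp, 0))  # True
```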
**Example.** (A simple case of finding an orthogonal basis for a 2-dimensional subspace.)
Let $W =\text{span}\{\begin{pmatrix}1\\1\\1\end{pmatrix},\begin{pmatrix}1\\2\\3\end{pmatrix} \}$, and a vector $x = \begin{pmatrix}3\\2\\1\end{pmatrix}$.
(a) Find an orthogonal basis for $W$.
(b) Compute $\text{proj}_{W}(x)$.
$\blacktriangleright$
(a) Note the two vectors $w_{1} = \begin{pmatrix}1\\1\\1\end{pmatrix},w_{2}=\begin{pmatrix}1\\2\\3\end{pmatrix}$ that span $W$ are not orthogonal to each other. But one can make a new basis $u_{1},u_{2}$ for $W$ that is orthogonal.
First let us fix $u_{1} = w_{1}$. Now, $u_{1}$ is not orthogonal to $w_{2}$. But what we can do is orthogonally project $w_{2}$ onto $u_{1}$, then subtract that component to get a vector orthogonal to $u_{1}$! That is, take $$
u_{2} = w_{2} - \text{proj}_{u_{1}}w_{2}
$$In this case, $u_{1}$ and $u_{2}$ are now orthogonal and still span $W$!
So, in our case, $$
\begin{align*}
u_{1} & = w_{1} = \begin{pmatrix}1\\1\\1\end{pmatrix}\\
u_{2} &= w_{2} - \text{proj}_{u_{1}}(w_{2}) = \begin{pmatrix}1\\2\\3\end{pmatrix} - \frac{\begin{pmatrix}1\\1\\1\end{pmatrix}\cdot\begin{pmatrix}1\\2\\3\end{pmatrix}}{\begin{pmatrix}1\\1\\1\end{pmatrix}\cdot\begin{pmatrix}1\\1\\1\end{pmatrix}}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-1\\0\\1\end{pmatrix}
\end{align*}
$$So an orthogonal basis for $W$ is $$
\begin{pmatrix}1\\1\\1\end{pmatrix},\begin{pmatrix}-1\\0\\1\end{pmatrix}.
$$
This is in fact the _Gram-Schmidt process_ for 2-dimensional subspaces.
(b) To compute $\text{proj}_{W}(x)$, we use the orthogonal basis $u_{1},u_{2}$ we just computed. So $$
\begin{align*}
\text{proj}_{W}(x)& = \text{proj}_{u_{1}}(x)+\text{proj}_{u_{2}}(x) \\
& = \frac{u_{1} \cdot x}{u_{1}\cdot u_{1}} u_{1}+ \frac{u_{2}\cdot x}{u_{2}\cdot u_{2}}u_{2} \\
&= \frac{6}{3}\begin{pmatrix}1\\1\\1\end{pmatrix}+\frac{-2}{2}\begin{pmatrix}-1\\0\\1\end{pmatrix} = \begin{pmatrix}3\\2\\1\end{pmatrix}.\quad\blacklozenge
\end{align*}
$$
Before we move on from this problem, notice how $\text{proj}_{W}(x)$ equals $x$ itself in this example. This is because $x$ happens to be in $W$ to begin with! This is a happy accident, but don't expect this to always be the case. Let us record this:
> If $x \in W$, then $\text{proj}_{W}(x) = x$.
Let us also record the method we just used to produce an orthogonal basis for a 2-dimensional subspace:
>**Gram-Schmidt for 2-dimensional subspaces.**
>Let $W$ be a subspace spanned by a basis set $w_{1},w_{2}$. Then we obtain an orthogonal basis $u_{1},u_{2}$ for $W$ by taking $$
\begin{align*}
u_{1} &= w_{1} \\
u_{2} &= w_{2}-\text{proj}_{u_{1}}(w_{2}).
\end{align*}
$$
One should draw a picture to see why this is the case. The higher dimensional process is analogous, which we will discuss in more detail later; I will write it down here for now:
Given $w_{1},w_{2},\ldots,w_{k}$ any basis for $W$. Then we can obtain an orthogonal basis $u_{1},u_{2},\ldots,u_{k}$ of $W$ by computing $$
\begin{align*}
u_{1} &= w_{1} \\
u_{2} &= w_{2} - \text{proj}_{u_1}(w_{2}) \\
u_{3} &= w_{3} - \text{proj}_{u_{1}}(w_{3}) - \text{proj}_{u_{2}}(w_{3}) \\
u_{4} &= w_{4} - \text{proj}_{u_{1}}(w_{4}) - \text{proj}_{u_{2}}(w_{4}) - \text{proj}_{u_{3}}(w_{4}) \\
&\ \ \vdots
\end{align*}
$$
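The recursion above can be written as a short routine. Here is a sketch in Python with NumPy; note it subtracts each projection from the running, partially-orthogonalized vector (the so-called modified Gram-Schmidt), which agrees with the displayed formula in exact arithmetic but behaves better in floating point.

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a basis w_1, ..., w_k of W into an orthogonal basis u_1, ..., u_k."""
    ortho = []
    for w in vectors:
        u = np.array(w, dtype=float)
        for v in ortho:
            u -= (v @ u) / (v @ v) * v  # subtract the projection onto each earlier u_i
        ortho.append(u)
    return ortho

# the 2-dimensional example from above
w1, w2 = [1.0, 1.0, 1.0], [1.0, 2.0, 3.0]
u1, u2 = gram_schmidt([w1, w2])
print(u1, u2)  # ≈ [1. 1. 1.]  [-1.  0.  1.]
```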
## Properties of orthogonal projection and its matrix.
If we take any subspace $W$, then from the construction of $\text{proj}_{W}(x)$ we can see that $\text{proj}_{W}$ is a linear transformation (it is just a sum of a bunch of dot products). So one may wonder what the standard matrix for $\text{proj}_{W}$ is. Here it is:
> **Matrix of an orthogonal projection.**
> Let $u_{1},u_{2},\ldots,u_{k}$ be an _orthonormal basis_ of $W$, then $$
\text{proj}_{W} =QQ^{T}
where $Q =\begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}$ is the matrix whose columns are the vectors of an orthonormal basis of $W$.
$\blacktriangleright$ Proof. Since $u_{i}\cdot u_{j}=\delta_{ij}$, as they form an orthonormal basis, we have $$
\begin{align*}
\text{proj}_{W}(x) &= \text{proj}_{u_{1}}(x) + \text{proj}_{u_{2}}(x)+\cdots + \text{proj}_{u_{k}}(x) \\
&=(u_{1}\cdot x)u_{1} + (u_{2}\cdot x)u_{2}+\cdots + (u_{k}\cdot x)u_{k} \\
&= \begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}\begin{pmatrix}u_{1}\cdot x\\u_{2}\cdot x\\\vdots\\u_{k}\cdot x\end{pmatrix} \\
&= \begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}\begin{pmatrix}u_{1}^{T}x\\u_{2}^{T} x\\\vdots\\u_{k}^T x\end{pmatrix} \\
&= \begin{bmatrix}|&|&&|\\u_{1}&u_{2}&\cdots&u_{k}\\|&|&&|\end{bmatrix}\begin{bmatrix}u_{1}^{T}\\u_{2}^{T}\\\vdots\\u_{k}^{T}\end{bmatrix}\begin{pmatrix}|\\x\\|\end{pmatrix} \\
&=QQ^{T}x.
\end{align*}
$$Since for all $x$ we have $\text{proj}_{W}(x) = QQ^{T}x$, we therefore have the standard matrix $\text{proj}_{W}= QQ^{T}$. $\blacksquare$
**Example.**
In the above example where $W$ has orthogonal basis $$
\begin{pmatrix}1\\1\\1\end{pmatrix},\begin{pmatrix}-1\\0\\1\end{pmatrix},
$$we can construct the projection matrix for $\text{proj}_{W}$ by computing $QQ^{T}$, but we need an orthonormal basis. If we normalize each of the vectors in the orthogonal basis set, we get $$
\frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix}, \frac{1}{\sqrt{2}}\begin{pmatrix}-1\\0\\1\end{pmatrix},
$$so we have $\text{proj}_{W}=QQ^{T}$ where $$
Q = \begin{pmatrix} \frac{1}{\sqrt{3}} &-\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} &0\\ \frac{1}{\sqrt{3}}& \frac{1}{\sqrt{2}}\end{pmatrix}
$$Hence $$
QQ^{T} = \begin{pmatrix} \frac{5}{6} & \frac{1}{3} & -\frac{1}{6} \\
\frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\
-\frac{1}{6} & \frac{1}{3} & \frac{5}{6}
\end{pmatrix}
$$And if we want to compute $\text{proj}_{W}(x)$, we can compute $QQ^{T}x$ instead. Check that indeed $$
\text{proj}_{W}\begin{pmatrix}1\\2\\3\end{pmatrix} = QQ^{T} \begin{pmatrix}1\\2\\3\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}
$$as we have discovered earlier: $\begin{pmatrix}1\\2\\3\end{pmatrix}$ is in $W$, so it projects to itself. $\blacklozenge$
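This projection matrix is easy to reproduce numerically. A short NumPy sketch:

```python
import numpy as np

# orthonormal basis of W, as in the example
u1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
u2 = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2)
Q = np.column_stack([u1, u2])
P = Q @ Q.T  # the matrix of proj_W

print(np.round(P * 6))  # ≈ [[ 5.  2. -1.], [ 2.  2.  2.], [-1.  2.  5.]]
x = np.array([1.0, 2.0, 3.0])
print(np.allclose(P @ x, x))  # True: this x lies in W, so proj_W(x) = x
```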
We can go a bit further. Since $\text{proj}_{u}(x) = (u\cdot x)u = uu^{T}x$ when $u$ is a unit vector, we can also write
> **Orthogonal projection as a sum of rank 1 orthogonal projections.**
> If $u_{1},u_{2},\ldots,u_{k}$ is an orthonormal basis of $W$, then the orthogonal projection $\text{proj}_{W}$ has matrix $$
\text{proj}_{W} = u_{1}u_{1}^{T}+ u_{2}u_{2}^{T}+\cdots + u_{k}u_{k}^{T}.
$$
In any case, this settles the mystery for an $n\times k$ matrix $Q$ with orthonormal columns:
- $Q^{T}Q=I_{k\times k}$
- $QQ^{T}=\text{proj}_{W}$, where $W$ is the columnspace of $Q$.
Intuitively, orthogonal projections are _idempotent_: performing an orthogonal projection twice is the same as performing it once:
> For $W$ a subspace, $(\text{proj}_{W})^{2}=\text{proj}_{W}$
Indeed, if we compute $$
(QQ^{T})^{2}= QQ^{T} QQ^{T}=QIQ^{T}=QQ^{T}.
$$
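Both identities, $Q^{T}Q = I_{k\times k}$ and $(QQ^{T})^{2} = QQ^{T}$, are easy to confirm numerically for the $Q$ from the example above (a NumPy sketch):

```python
import numpy as np

Q = np.column_stack([np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
                     np.array([-1.0, 0.0, 1.0]) / np.sqrt(2)])
P = Q @ Q.T

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q^T Q = I_k
print(np.allclose(P @ P, P))            # True: proj_W is idempotent
```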
And, we have
> For $W$ a subspace of $\mathbb R^{n}$, $$
\begin{array}{l}
\text{the range of \(\text{proj}_W\) is \(W\)} \\
\text{the kernel of \(\text{proj}_W\) is \(W^\perp\)}.
\end{array}
$$
Hence by rank-nullity theorem, we have the corollary that
> For $W$ a subspace of $\mathbb R^{n}$, $$
\dim W + \dim W^{\perp} = n.
$$
## Best approximation theorem.
This vector $\text{proj}_{W}(x)$, the orthogonal projection of a vector $x$ onto $W$, is the vector in $W$ closest to $x$ in distance. That is, it is the vector that best approximates $x$ if we are constrained to $W$. To make this precise,
> **Best approximation theorem.**
> Let $W$ be a subspace in $\mathbb R^{n}$, and let $x$ be any vector in $\mathbb R^{n}$. Then for any vector $w \in W$, we always have $$
\Vert \text{proj}_{W}(x) - x\Vert \le \Vert w - x\Vert.
$$
> In other words, $\text{proj}_{W}(x)$ is the vector in $W$ closest to $x$.
$\blacktriangleright$ Proof. First we have $x = x_{||} + x_{\perp}=\text{proj}_{W}(x) + x_{\perp}$. Now we compute $$
\begin{align*}
\Vert w - x\Vert^{2} &= \Vert \underbrace{ w - \text{proj}_{W}(x)}_{\in W}-\underbrace{x_{\perp}}_{\in W^{\perp}}\Vert^{2} \\
&=\Vert w - \text{proj}_{W}(x) \Vert^{2} + \Vert x_{\perp}\Vert^{2} \quad \text{(by the Pythagorean theorem)} \\
&\ge \Vert x_{\perp} \Vert^{2} \\
\implies \Vert w- x\Vert^{2} &\ge\Vert x-\text{proj}_{W}(x)\Vert^{2}
\end{align*}
$$
Hence this shows $\Vert\text{proj}_{W}(x)-x\Vert \le \Vert w-x\Vert$. $\blacksquare$
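As a final sanity check, here is a NumPy sketch that samples many vectors $w \in W$ and confirms none of them is closer to $x$ than $\text{proj}_W(x)$; the random subspace, dimensions, and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# orthonormal basis of a random 2-dimensional subspace W of R^5,
# via the QR factorization of a random 5x2 matrix
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
P = Q @ Q.T

x = rng.standard_normal(5)
best_dist = np.linalg.norm(P @ x - x)  # distance from x to proj_W(x)

# sample many w in W; none should beat the projection
dists = [np.linalg.norm(Q @ rng.standard_normal(2) - x) for _ in range(1000)]
print(best_dist <= min(dists) + 1e-12)  # True
```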